Confidence Guided Progressive Search and Fast Match Techniques for High Performance Chinese/English OCR

Authors

  • Zhi-Dan Feng
  • Qiang Huo
Abstract

Over the past several years, we have been developing a high-performance OCR engine for machine-printed Chinese/English documents. In this paper, we present two innovative techniques that contribute to the high efficiency of recognizing mixed Chinese/English text lines: (1) a progressive search strategy based on character verification, and (2) a tree-based fast match technique with a confidence-guided adaptive stopping mechanism. The efficacy of the proposed techniques is confirmed by experiments in a benchmark test.

Related resources

Improving Chinese/English OCR Performance by Using MCE-based Character-Pair Modeling and Negative Training

In the past several years, we’ve been developing a high performance OCR engine for machine printed Chinese/English documents. We have reported previously (1) how to use character modeling techniques based on MCE (minimum classification error) training to achieve the high recognition accuracy, and (2) how to use confidence-guided progressive search and fast match techniques to achieve the high r...

Japanese OCR Error Correction using Character Shape Similarity and Statistical Language Model

We present a novel OCR error correction method for languages without word delimiters that have a large character set, such as Japanese and Chinese. It consists of a statistical OCR model, an approximate word matching method using character shape similarity, and a word segmentation algorithm using a statistical language model. By using a statistical OCR model and character shape similarity, the ...

Chinese and Korean Topic Search of Japanese News Collections

UC Berkeley participated in the pivot bilingual task of the CLIR track at NTCIR Workshop 4. Our focus was on Chinese and Korean searches against the Japanese News document collection, using English as a pivot language. For comparison of our pivot techniques, we submitted Japanese monolingual and English Japanese bilingual search rankings as well. Two different commercial translation software pa...

Comparing the Effects of Progressive Muscle Relaxation and Guided Imagery on sleep quality in primigravida women referring to Mashhad health care centers-1393

Background & aim: Decreased sleep quality is a common complaint during pregnancy. Relaxation is one of the non-pharmaceutical treatments for sleep disorders. Different techniques could have different impacts on various biological and mental stressors. Therefore, this study aimed to compare the effects of progressive muscle relaxation and guided imagery on the sleep quality of primigravida women...

High performance Chinese OCR based on Gabor features, discriminative feature extraction and model training

We’ve been developing a Chinese OCR engine for machine printed documents. Currently, our OCR engine can support a vocabulary of 6921 characters which include 6707 simplified Chinese characters in GB2312-80, 12 frequently used GBK Chinese characters, 62 alphanumeric characters, 140 punctuation marks and symbols. The supported font styles include Song, Fang Song, Kai, He, Yuan, LiShu, WeiBei, Xin...

Publication date: 2002